
    Protein Docking by the Underestimation of Free Energy Funnels in the Space of Encounter Complexes

    Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.
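
    As a rough illustration of the underestimation step, the sketch below fits a convex quadratic beneath a set of sampled local minima and reads off its minimizer as the point towards which further sampling would be biased. It restricts the quadratic to a diagonal Hessian so that the fit becomes a linear program rather than the semidefinite program used by SDU; the variable names and toy data are illustrative, not the authors' implementation.

```python
# Minimal sketch of the underestimation idea behind SDU (not the authors' code).
# A convex quadratic is fitted below a set of local energy minima; here the
# quadratic is restricted to a diagonal Hessian so the fit reduces to a linear
# program instead of a semidefinite program. All names and data are illustrative.
import numpy as np
from scipy.optimize import linprog

def fit_diagonal_underestimator(X, E):
    """X: (n, m) local-minimum coordinates, E: (n,) energies.
    Fit q(x) = sum_j d_j x_j^2 + b_j x_j + c with d_j >= 0 and q(x_i) <= E_i,
    maximizing sum_i q(x_i), i.e. minimizing the total underestimation gap."""
    n, m = X.shape
    A = np.hstack([X**2, X, np.ones((n, 1))])   # q(x_i) = A[i] @ z, z = [d, b, c]
    cost = -A.sum(axis=0)                        # maximize sum_i q(x_i)
    bounds = [(0, None)] * m + [(None, None)] * (m + 1)
    res = linprog(cost, A_ub=A, b_ub=E, bounds=bounds)
    d, b, c = res.x[:m], res.x[m:2 * m], res.x[-1]
    x_star = np.where(d > 1e-9, -b / (2 * np.maximum(d, 1e-9)), 0.0)
    return d, b, c, x_star                       # x_star biases further sampling

# Toy usage: minima scattered in a 5-D angular-descriptor space.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
E = (X**2).sum(axis=1) + 0.3 * rng.normal(size=40)
_, _, _, x_star = fit_diagonal_underestimator(X, E)
print("predicted funnel bottom:", np.round(x_star, 2))
```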

    Efficient maintenance and update of nonbonded lists in macromolecular simulations

    Molecular mechanics and dynamics simulations use distance based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the “nonbonded neighborhood lists” or nblists, in order to reduce the overhead of finding atom pairs that are within the distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure and hence less prone to cache-miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster than nblists in practical use-case scenarios.
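
    A minimal sketch of the idea, assuming a plain point octree with leaf buckets and cube-pruned range queries; the paper's dynamic, update-in-place structure is more involved than this toy version.

```python
# Minimal octree sketch for cutoff-limited neighbor queries (illustrative only;
# not the paper's implementation, which also supports efficient dynamic updates).
import numpy as np

class Octree:
    def __init__(self, center, half, leaf_size=8):
        self.center, self.half, self.leaf_size = np.asarray(center, float), float(half), leaf_size
        self.points, self.ids, self.children = [], [], None

    def insert(self, p, i):
        if self.children is None:
            self.points.append(np.asarray(p, float))
            self.ids.append(i)
            if len(self.points) > self.leaf_size and self.half > 1e-6:
                self._split()
        else:
            self._child(p).insert(p, i)

    def _split(self):
        # eight child cubes, one per octant of the current cube
        self.children = [Octree(self.center + self.half / 2 * np.array(
            [(o >> 2 & 1) * 2 - 1, (o >> 1 & 1) * 2 - 1, (o & 1) * 2 - 1]),
            self.half / 2, self.leaf_size) for o in range(8)]
        for p, i in zip(self.points, self.ids):
            self._child(p).insert(p, i)
        self.points, self.ids = [], []

    def _child(self, p):
        o = (int(p[0] > self.center[0]) << 2) | (int(p[1] > self.center[1]) << 1) | int(p[2] > self.center[2])
        return self.children[o]

    def query(self, p, cutoff, out):
        # prune subtrees whose bounding cube cannot contain points within cutoff
        if np.any(np.abs(p - self.center) > self.half + cutoff):
            return
        for q, i in zip(self.points, self.ids):
            if np.dot(p - q, p - q) <= cutoff * cutoff:
                out.append(i)          # note: includes the query atom itself
        if self.children:
            for ch in self.children:
                ch.query(p, cutoff, out)

# Usage: build once, then query with any cutoff without rebuilding an explicit nblist.
rng = np.random.default_rng(1)
atoms = rng.uniform(-10, 10, size=(500, 3))
tree = Octree(center=[0, 0, 0], half=10.0)
for i, p in enumerate(atoms):
    tree.insert(p, i)
neighbors = []
tree.query(atoms[0], 4.0, neighbors)
print(len(neighbors), "atoms within 4.0 of atom 0")
```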

    Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes

    We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times, and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.
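
    The sketch below illustrates one refinement round under simplifying assumptions: the “permissive” subspace is stood in for by a principal-component projection, and the convex surrogate is a least-squares quadratic whose Hessian is clipped to be positive definite rather than the paper's convex-polynomial underestimator. Names and toy data are illustrative.

```python
# Hedged sketch of one refinement round in the spirit of the paper: project a
# cluster of conformations onto a low-dimensional subspace, fit a convex
# quadratic surrogate to their energies, and bias new samples towards its
# minimum. The PSD projection (eigenvalue clipping after a least-squares fit)
# stands in for the paper's convex-polynomial underestimation.
import numpy as np

def refine_cluster(X, E, dim=3, n_new=50, seed=2):
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    U = np.linalg.svd(X - mean, full_matrices=False)[2][:dim]   # subspace basis
    Z = (X - mean) @ U.T                                        # low-dim coordinates

    # least-squares quadratic fit E ~ z^T H z + g^T z + c, then clip H to PSD
    quad = np.einsum('ni,nj->nij', Z, Z).reshape(len(Z), -1)
    A = np.hstack([quad, Z, np.ones((len(Z), 1))])
    coef = np.linalg.lstsq(A, E, rcond=None)[0]
    H = coef[:dim * dim].reshape(dim, dim)
    H = (H + H.T) / 2
    w, V = np.linalg.eigh(H)
    H = V @ np.diag(np.maximum(w, 1e-3)) @ V.T                  # convexify
    g = coef[dim * dim:dim * dim + dim]

    z_star = np.linalg.solve(2 * H, -g)                         # surrogate minimum
    Z_new = z_star + 0.1 * rng.normal(size=(n_new, dim))        # biased samples
    return mean + Z_new @ U                                     # back to full space

# Toy usage with synthetic conformations and energies.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 12))
E = (X[:, :3]**2).sum(axis=1) + 0.1 * rng.normal(size=200)
refined = refine_cluster(X, E)
print(refined.shape)
```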

    Improved prediction of MHC-peptide binding using protein language models

    Major histocompatibility complex Class I (MHC-I) molecules bind to peptides derived from intracellular antigens and present them on the surface of cells, allowing the immune system (T cells) to detect them. Elucidating the process of this presentation is essential for regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in the above process and has motivated the introduction of many computational approaches to address this problem. NetMHCpan, a pan-specific model for predicting the binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent successes of Deep Learning (DL) methods, especially pretrained Natural Language Processing (NLP)-based models, in various applications including protein structure determination, motivated us to explore their use for this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.
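
    A hedged sketch of the general architecture this describes: fixed embeddings from some pretrained protein language model feed a small binding-classification head. The embed() function is a hypothetical placeholder, the sequences are arbitrary examples, and the head shown here is not the paper's model.

```python
# Illustrative sketch only: a binding classifier on top of fixed embeddings from
# some pretrained protein language model. `embed()` is a hypothetical stand-in
# for whatever encoder is used; the paper's architecture and training may differ.
import numpy as np
import torch
import torch.nn as nn

def embed(sequence: str, dim: int = 1280) -> np.ndarray:
    """Placeholder: returns a fixed-length embedding for a sequence.
    Replace with a call to a real pretrained protein language model."""
    rng = np.random.default_rng(sum(map(ord, sequence)))
    return rng.normal(size=dim).astype(np.float32)

class BindingHead(nn.Module):
    """Small feed-forward head over concatenated peptide + MHC embeddings."""
    def __init__(self, dim=1280):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, pep, mhc):
        return self.net(torch.cat([pep, mhc], dim=-1)).squeeze(-1)

# Toy training step on one (peptide, MHC sequence, label) example.
model, loss_fn = BindingHead(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
pep = torch.from_numpy(embed("SIINFEKL")).unsqueeze(0)          # example peptide
mhc = torch.from_numpy(embed("ARBITRARYMHCSEQUENCE")).unsqueeze(0)  # arbitrary string
label = torch.tensor([1.0])                                      # 1 = binder
loss = loss_fn(model(pep, mhc), label)
opt.zero_grad(); loss.backward(); opt.step()
```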

    Control variate technique: A constructive approach

    The technique of control variates requires that the user identify a set of variates that are correlated with the estimation variable and whose means are known to the user. We relax the known mean requirement and instead assume the means are to be estimated. We argue that this strategy can be beneficial in parametric studies, analyze the properties of controlled estimators, and propose a class of generic and effective controls in a parametric estimation setting. We discuss the effectiveness of the estimators via analysis and simulation experiments.
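
    A minimal sketch of the relaxation described above: the control's mean is estimated from an auxiliary sample instead of being assumed known, and the controlled estimator is then formed in the usual way. The toy simulation and coefficient choice are illustrative, not the paper's construction.

```python
# Minimal control-variate sketch in which the control mean is itself estimated
# from a separate sample rather than known exactly. Toy problem for illustration.
import numpy as np

rng = np.random.default_rng(4)

def simulate(n):
    """Toy simulation: Y is the costly output, C a correlated control."""
    u = rng.normal(size=n)
    c = u                                       # control variate (true mean 0)
    y = np.exp(0.1 * u) + 0.05 * rng.normal(size=n)
    return y, c

y, c = simulate(2_000)                          # main run
_, c_aux = simulate(20_000)                     # cheaper auxiliary run
mu_c_hat = c_aux.mean()                         # estimated (not known) control mean

beta = np.cov(y, c)[0, 1] / np.var(c)           # estimated optimal coefficient
cv_estimate = y.mean() - beta * (c.mean() - mu_c_hat)
print("plain mean:", y.mean(), "controlled:", cv_estimate)
```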

    Global Optimization: Partitioned Random Search and Optimal Index Policies

    We consider a combination of state-space partitioning and random search methods for solving deterministic global optimization problems. We assume that function computations are costly and finding the global optimum is difficult; therefore, we may decide to stop searching long before we find a solution close to the optimum. The final reward of the algorithm is defined as the best function value found minus the total cost of computations. We construct an index sampling policy that is asymptotically optimal on average when the number of search regions k is large. The sampling index for each search region is defined as the stopping value of sampling from that region only. The stopping-value selection policy is an improvement over the myopic and heuristic index rules used in partitioned random search and stochastic branch-and-bound algorithms. 1 Introduction. We consider the problem of global optimization of a function f(x): A → ℝ, where A ⊆ ℝ^d (d ≥ 1). Finding the true global optimum x* = arg max_{x∈A} f(…
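
    The sketch below illustrates the setting under simplifying assumptions: the domain is partitioned into regions, each region carries an index, the highest-index region receives the next sample, and the search stops once no index exceeds zero. The index used here (expected improvement net of sampling cost) is only a stand-in for the paper's stopping-value index.

```python
# Simplified sketch of partitioned random search with an index policy.
# The per-region index (normal-model expected improvement over the current
# best, minus sampling cost) is a stand-in for the paper's stopping-value index.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def f(x):                                       # costly black-box objective (toy)
    return np.sin(5 * x) + 0.1 * rng.normal()

regions = np.linspace(0, 1, 11)                 # k = 10 regions partitioning [0, 1]
samples = [[] for _ in range(10)]
cost_per_eval, total_cost, best = 0.02, 0.0, -np.inf

def region_index(vals, best):
    if len(vals) < 2:
        return np.inf                           # force initial exploration
    m, s = np.mean(vals), np.std(vals) + 1e-9
    z = (m - best) / s
    return s * (z * norm.cdf(z) + norm.pdf(z)) - cost_per_eval   # EI minus cost

for step in range(200):
    idx = [region_index(v, best) for v in samples]
    j = int(np.argmax(idx))
    if idx[j] <= 0:                             # stop: no region worth another sample
        break
    x = rng.uniform(regions[j], regions[j + 1])
    y = f(x)
    samples[j].append(y)
    total_cost += cost_per_eval
    best = max(best, y)

print("best value:", best, "net reward:", best - total_cost)
```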

    Application of Selection Rules to Statistical Global Optimization

    We consider global optimization methods that are based on statistical modeling of the objective function: at each step of the optimization algorithm, a distribution of possible function values is defined for all candidate points, and the next sampling point is selected so as to optimize the average performance of the algorithm. It is difficult to estimate the total effect of sampling at a given point on algorithm performance; therefore, the standard approach is to sample the point that maximizes a certain one-step utility function u(x). Because u(x) does not take into account the effects of future sampling, the utility function is modified based on heuristic considerations in order to make the search more global. We suggest considering an optimization criterion that includes the cost of computations in the objective function. This setting allows us to directly compute a utility function that maximizes the total expected reward of sampling. Assuming a non-adaptive model of the objective function, the optimal utility funct..
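
    As an illustration of a one-step utility that accounts for computation cost, the sketch below assumes a Wiener-process model of a one-dimensional objective and scores interval midpoints by expected improvement minus the cost of one evaluation. The paper derives the optimal utility; this simpler one is only meant to show the setting.

```python
# Hedged sketch of one-step utility selection under a simple statistical model:
# a Wiener-process (Brownian-bridge) model on [0, 1], candidate points at
# interval midpoints, utility = expected improvement minus evaluation cost.
import numpy as np
from scipy.stats import norm

def one_step_utility(xs, ys, sigma=1.0, cost=0.01):
    """Return (best candidate point, its net utility) given observations (xs, ys),
    for minimization of the modeled objective."""
    order = np.argsort(xs)
    xs, ys = np.asarray(xs)[order], np.asarray(ys)[order]
    y_best, best = ys.min(), (None, -np.inf)
    for (a, ya), (b, yb) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        x = (a + b) / 2                                   # candidate point
        mean = (ya + yb) / 2                              # bridge mean at midpoint
        std = sigma * np.sqrt((x - a) * (b - x) / (b - a))
        z = (y_best - mean) / std
        ei = std * (z * norm.cdf(z) + norm.pdf(z))        # expected improvement
        if ei - cost > best[1]:
            best = (x, ei - cost)
    return best

xs, ys = [0.0, 0.4, 1.0], [0.9, 0.2, 0.5]
x_next, utility = one_step_utility(xs, ys)
print(x_next, utility)   # sample x_next only if its net utility is positive
```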

    Competing Intelligent Search Agents in Global Optimization

    In this paper we present a new search methodology that we view as a development of the intelligent-agent approach to the analysis of complex systems. The main idea is to consider the search process as a competition between concurrent adaptive intelligent agents. Agents cooperate in achieving a common search goal and at the same time compete with each other for computational resources. We propose a statistical selection approach to resource allocation between agents that leads to simple index allocation policies which are efficient on average. We use global optimization as the most general setting that encompasses many types of search problems, and show how the proposed selection policies can be used to improve and combine various global optimization methods. This work opens a way to developing effective numerical procedures that reflect the qualitative knowledge absorbed by intelligent search architectures developed for particular applications. We discuss examples in the areas of manufacturing control and scheduling, optimization via simulation, classification, data mining, and multitarget tracking. We propose designing a new software package that will consist of a database of heuristic search methods for particular problems and a control engine that utilizes statistical procedures to distribute computational resources between the different methods. We describe the organization of the competing search processes in Section 2. We analyze global optimization models in Section 3 and list applications of the competing search methodologies in Section 4.
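
    A simplified sketch of the competition mechanism, assuming two toy agents and a UCB-style index on the improvements credited to each agent; the paper's statistical selection policies are not reproduced here.

```python
# Simplified sketch of competing search agents sharing an evaluation budget.
# Each agent proposes candidate points; the controller gives the next evaluation
# to the agent with the highest index (a UCB-style score on its past
# improvements). This is a stand-in for the paper's selection policies.
import numpy as np

rng = np.random.default_rng(7)

def objective(x):
    return -np.sum((x - 0.3)**2) + 0.01 * rng.normal()

def random_agent(best_x):                        # global exploration
    return rng.uniform(0, 1, size=2)

def local_agent(best_x):                         # local refinement of incumbent
    return np.clip(best_x + 0.05 * rng.normal(size=2), 0, 1)

agents = [random_agent, local_agent]
rewards = [[1e-3] for _ in agents]               # improvements credited per agent
best_x, best_y = rng.uniform(0, 1, size=2), -np.inf

for t in range(1, 301):
    index = [np.mean(r) + np.sqrt(2 * np.log(t) / len(r)) for r in rewards]
    a = int(np.argmax(index))                    # agent that wins this evaluation
    x = agents[a](best_x)
    y = objective(x)
    rewards[a].append(max(y - best_y, 0.0) if np.isfinite(best_y) else 0.0)
    if y > best_y:
        best_x, best_y = x, y

print("best found:", best_x, best_y)
```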